Chapter 1

Exact record and order statistics of random walks via 
first-passage ideas 

Gregory Schehr and Satya N. Majumdar 

Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS),
Univ. Paris-Sud, CNRS, 91405 Orsay Cedex, France

While records and order statistics of independent and identically
distributed (i.i.d.) random variables $X_1, \cdots, X_N$ are fully understood,
much less is known for strongly correlated random variables, which is 
often the situation encountered in statistical physics. Recently, it was 
shown, in a series of works, that one-dimensional random walk (RW) is 
an interesting laboratory where the influence of strong correlations on 
records and order statistics can be studied in detail. We review here 
recent exact results which have been obtained for these questions about 
RW, using techniques borrowed from the study of first-passage problems. 
We also present a brief review of the well known (and not so well known) 
results for records and order statistics of i.i.d. variables. 



1. Introduction 

Records and order statistics are by now a longstanding issue in the fields 
of engineering, finance or environmental sciences, where extreme events
might have drastic consequences. Indeed, in these contexts, the statistics of 
extremes have practical applications which include the prediction of proba- 
bility distributions of extreme floods, the amounts of large insurance losses, 
equity risk, the size of freak waves, mutation events during evolution, ex- 
treme statistics of time series, etc. These notions are very popular in our 
societies as, for instance, one always hears and reads, in the media, about 
record breaking events. This is especially true for sports, where world 
records are always special and noteworthy.

More recently, it was realized that records and order statistics play a 
crucial role in statistical physics. Hence, there has been a surge of interest 
for these questions in the physics literature. If one considers a discrete
time series $X_1, \cdots, X_N$, where the $X_i$'s might represent daily temperatures in
a given city or the stock prices of a company, a record happens at time $k$
if the $k$-th entry is larger than all previous entries $X_1, \cdots, X_{k-1}$ (see
Fig. 1, left). One is naturally led to ask the following questions: (a) how many
records occur in time $N$? (b) how long does a record survive? What is the
longest or shortest age of a record? Such questions and related ones have
found applications in various physical situations ranging from domain wall
dynamics, spin-glasses and random walks to avalanches, models of
stock prices or the study of global warming [14, 15] and also in evolutionary
biology (see Ref. [?] for a recent review).

Another interesting question about this sequence concerns the fluctuations
of the ordered sequence (so called order statistics) obtained by arranging
the values of $X_i$ by decreasing order of magnitude,
$M_{1,N} > M_{2,N} > \cdots > M_{N,N}$, $M_{k,N}$ being the $k$-th maximum of this
sequence. Questions related to the statistics of the first maximum,
$X_{\max} = M_{1,N}$, have emerged in various areas of physics ranging from
disordered systems and fluctuating interfaces to stochastic processes, random
matrices and many others. While the statistics of the extremum $X_{\max}$ is
important, another natural question is: is this extremal value isolated, i.e.,
far away from the others, or are there many other events close to it? Such
questions have led to the study of the density of states of near-extreme
events. Order statistics is a natural way to characterize this phenomenon of
crowding of near-extreme events. A set of useful observables that are
naturally sensitive to the crowding of extrema are the gaps between the
consecutive ordered maxima, $d_{k,N} = M_{k,N} - M_{k+1,N}$ denoting the $k$-th gap.
Such questions came up in several physical contexts, in particular in the
study of the branching Brownian motion [31, 32] and also for $1/f^\alpha$ signals
and more recently for random walks.

Records and order statistics of i.i.d. random variables are now perfectly
well understood [1, 37] and we shall briefly review below the main
results in this case. The record statistics when the entries $X_i$'s have
non-identical distributions but still retain their independence were also
studied in Ref. [?] in the so called Linear Drift Model. On the other hand the
order statistics of weakly correlated random variables reduce, to a large
extent, to the case of i.i.d. random variables. However, much less is known
for the difficult case where the $X_i$'s are strongly correlated, which turns
out to be the case of interest in many problems of statistical physics.
Recently, it was shown that the one-dimensional random walk (RW) is a
non-trivial instance of a set of strongly correlated variables for which exact
results for records [10, 11] and order statistics [34, 35] can be obtained. In
this paper, we review the main body of these results, which have been
obtained, to a large extent, by methods and ideas stemming from first-passage
problems (for a review on this topic see Ref. [?]).

The paper is organized as follows. In section 2, we first focus on records 
statistics while we focus on order statistics in section 3. In each section, we 
first present a brief overview of well known, and not so well known, results
for i.i.d. random variables. This is then followed by the review of results 
recently obtained for RW. 

2. Record statistics 

2.1. Record statistics of i.i.d. random variables 

We start with a short review of standard results for record statistics of i.i.d.
random variables. We denote by $X_1, X_2, \cdots, X_N$ a collection of $N$ i.i.d.
random variables, distributed according to a continuous probability density
function (pdf) $p(x)$. An entry $X_k$ is an upper record if it is larger than all
previous entries (see Fig. 1, left):

$$ X_k > \max\{X_1, \cdots, X_{k-1}\} , \quad k \le N . \tag{1} $$

One can similarly define a lower record, which is such that
$X_k < \min\{X_1, \cdots, X_{k-1}\}$.




Fig. 1. Left: One realization of $N = 24$ random variables $X_i$'s, for which the number of
records (the black dots) is $R_{24} = 6$. Right: Order statistics of $N = 7$ random variables.
$M_{k,7}$ denotes the $k$-th maximum of the sequence.

In the following, we will focus on upper records (1), which we will simply
call "records". Let $R_N$ be the number of records (1) among these $N$
random variables. We first discuss a straightforward method, based on
indicator variables, to investigate the statistics of $R_N$. Then we discuss more
complicated joint probability distributions of the number and the ages of
the records. This second method is not only useful to investigate the ages
of the longest and shortest records but can be generalized, with some
appropriate modifications, to the study of the records of random walks.

2.1.1. Distribution of the number of records 

To study this quantity it is useful to introduce indicator variables $\sigma_k$'s which
take the value 0 or 1:

$$ \sigma_k = \begin{cases} 1 & \text{if } X_k \text{ is a record} , \\ 0 & \text{if } X_k \text{ is NOT a record} , \end{cases} \qquad R_N = \sum_{k=1}^{N} \sigma_k . \tag{2} $$

For i.i.d. random variables, these indicator variables $\sigma_k$'s are independent.
We define

$$ \langle \sigma_k \rangle = r_k , \tag{3} $$

where the average is taken over the different realizations of the random
variables $X_1, \cdots, X_N$: $r_k$ is thus the rate at which a record is broken at
"time" $k$. For i.i.d. random variables, it is straightforward to compute
the record rate as it is precisely the probability that the event in Eq. (1)
happens. This yields



$$ r_k = \int_{-\infty}^{\infty} p(x) \left[ \int_{-\infty}^{x} p(y)\,dy \right]^{k-1} dx = \int_0^1 u^{k-1}\,du = \frac{1}{k} , \tag{4} $$

where we have used the change of variable $u = \int_{-\infty}^{x} p(y)\,dy$. This result
$r_k = 1/k$ (4), independent of the parent distribution, can be easily understood:
the probability that $X_k$ is the maximum among $X_1, \cdots, X_k$ is indeed $1/k$,
as the maximal value can be realized with equal probability by any of these
$k$ i.i.d. random variables. From (2) and (4) we get the mean number of records as

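$$ \langle R_N \rangle = \sum_{k=1}^{N} r_k = \sum_{k=1}^{N} \frac{1}{k} = H_N , \tag{5} $$

where $H_N$ denotes the $N$-th harmonic number. For large $N$, it behaves as

$$ \langle R_N \rangle = \log N + \gamma_E + O(N^{-1}) , \tag{6} $$

where $\gamma_E = 0.57721\cdots$ is the Euler constant. Similarly, the variance
can be evaluated using the indicator variables as

$$ \begin{aligned} \langle R_N^2 \rangle - \langle R_N \rangle^2 &= \sum_{k=1}^{N} \left( \langle \sigma_k^2 \rangle - \langle \sigma_k \rangle^2 \right) = \sum_{k=1}^{N} \left( \frac{1}{k} - \frac{1}{k^2} \right) \\ &= \log N + \gamma_E - \frac{\pi^2}{6} + O(1/N) , \end{aligned} \tag{7} $$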

where we have used, in the first line of Eq. (7), that the $\sigma_k$'s are
independent. Similarly, one can compute the generating function (GF) of the
probability distribution $P(M|N) = \mathbb{P}(R_N = M)$ using (for $N \ge 1$)

$$ \sum_{M=1}^{\infty} P(M|N)\, x^M = \langle x^{R_N} \rangle = \prod_{k=1}^{N} \langle x^{\sigma_k} \rangle = \prod_{k=1}^{N} \left( \frac{x}{k} + 1 - \frac{1}{k} \right) = \frac{x(x+1)\cdots(x+N-1)}{N!} . \tag{8} $$

One recognizes that the rising factorial appearing in (8) is the GF of the
unsigned Stirling numbers of the first kind

$$ x(x+1)\cdots(x+N-1) = \sum_{M=1}^{N} \left[ {N \atop M} \right] x^M , \tag{9} $$

where the unsigned Stirling numbers $\left[ {N \atop M} \right]$ enumerate the
permutations of $N$ elements with exactly $M$ disjoint cycles. Hence one has

$$ P(M|N) = \frac{1}{N!} \left[ {N \atop M} \right] , \tag{10} $$

which thus shows that the number of records of $N$ i.i.d. random variables
is distributed like the number of cycles in random permutations of $N$
objects with uniform measure. We will come back later, in section 2.1.5, to
this connection with random permutations. Finally, using the asymptotic
behaviors of Stirling numbers, one can show that the distribution of $R_N$
approaches, when $N \to \infty$, a Gaussian distribution

$$ P(M|N) \sim \frac{1}{\sqrt{2\pi \log N}} \exp\left( -\frac{(M - \log N)^2}{2 \log N} \right) . \tag{11} $$
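
As a quick sanity check of Eqs. (5) and (8)-(10), the following Python sketch enumerates all orderings of $N$ distinct values (for continuous i.i.d. variables every ordering is equally likely) and compares the resulting distribution of the number of records with the coefficients of the rising factorial. The function names are ours and purely illustrative; exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def count_records(seq):
    """Number of upper records; the first entry always counts as a record."""
    best, n = None, 0
    for x in seq:
        if best is None or x > best:
            best, n = x, n + 1
    return n


def record_distribution_by_enumeration(N):
    """Exact P(M|N), M = 0..N, from all N! equally likely orderings."""
    counts = [0] * (N + 1)
    for perm in permutations(range(N)):
        counts[count_records(perm)] += 1
    return [Fraction(c, factorial(N)) for c in counts]


def record_distribution_from_rising_factorial(N):
    """Coefficients of x(x+1)...(x+N-1) / N!, as in Eqs. (8)-(10)."""
    coeffs = [Fraction(1)]                 # the constant polynomial 1
    for k in range(N):                     # multiply by (x + k)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for power, c in enumerate(coeffs):
            new[power] += k * c            # contribution of the constant k
            new[power + 1] += c            # contribution of x
        coeffs = new
    return [c / factorial(N) for c in coeffs]


def harmonic(N):
    """N-th harmonic number H_N as an exact fraction, Eq. (5)."""
    return sum(Fraction(1, k) for k in range(1, N + 1))
```

For small $N$ (the enumeration costs $N!$ operations) the two distributions should coincide entry by entry, and the mean of either one should equal `harmonic(N)`.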

Here we have discussed the case where the random variables $X_i$'s are
continuous random variables. We refer the reader to Ref. [?] for a
discussion of the effects of discreteness, in particular when continuous random
variables are subsequently discretized by rounding to integer multiples of a
discretization scale.

2.1.2. Joint distribution of the ages of records and of their number

Let us consider a realization of these $N$ i.i.d. random variables $X_i$'s,
which we consider as a time series, the index $i$ playing the role of
discrete time. Let $M$ be the number of records in this realization. We denote by
$\vec{l} = (l_1, l_2, \cdots, l_M)$ the time intervals between successive records as
depicted in Fig. 1. Thus $l_i$ is the age of the $i$-th record, i.e. it denotes the
time up to which the $i$-th record survives. Note that the last record, the $M$-th
record in this sequence, still stays a record at "time" $N$. We first compute
the joint probability distribution $P(\vec{l}, M|N)$ of the ages $\vec{l}$ and the number
$M$ of records, given the length $N$ of the sequence. This joint pdf can be
written as

$$ P(\vec{l}, M|N) = \int_{-\infty}^{\infty} dy_M\, p(y_M) \left[ \int_{-\infty}^{y_M} p(x)\,dx \right]^{l_M - 1} \prod_{k=1}^{M-1} \int_{-\infty}^{y_{k+1}} dy_k\, p(y_k) \left[ \int_{-\infty}^{y_k} p(x)\,dx \right]^{l_k - 1} \delta_{\sum_{k=1}^{M} l_k ,\, N} , \tag{12} $$

where the delta function in (12) ensures that the size of the sample is $N$. If
one performs the change of variables $u_k = \int_{-\infty}^{y_k} p(x)\,dx$, the pdf
$P(\vec{l}, M|N)$ in (12) can be written as

$$ P(\vec{l}, M|N) = \int_0^1 du_M\, u_M^{l_M - 1} \prod_{k=1}^{M-1} \int_0^{u_{k+1}} du_k\, u_k^{l_k - 1}\; \delta_{\sum_{k=1}^{M} l_k ,\, N} . \tag{13} $$

This multiple integral in (13) can be performed straightforwardly to obtain

$$ P(\vec{l}, M|N) = \frac{1}{l_1 (l_1 + l_2) \cdots (l_1 + l_2 + \cdots + l_M)}\; \delta_{\sum_{k=1}^{M} l_k ,\, N} . \tag{14} $$

Eq. (14) carries more information than just the number of records $R_N$.
In fact, as we show below, this result (14) can be conveniently used to
compute the statistics of the ages of the longest and shortest records.
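
To illustrate how Eq. (14) contains the distribution of the number of records, the following Python sketch sums the weight in (14) over all ordered sets of ages $(l_1, \cdots, l_M)$ adding up to $N$, which should reproduce $P(M|N)$ of Eq. (10). The helper names (`compositions`, `joint_weight`, `prob_M_records`) are hypothetical and chosen for this illustration only.

```python
from fractions import Fraction


def compositions(N, M):
    """All ordered ways to write N as a sum of M positive integers."""
    if M == 1:
        yield (N,)
        return
    for first in range(1, N - M + 2):      # leave at least 1 for each other part
        for rest in compositions(N - first, M - 1):
            yield (first,) + rest


def joint_weight(ages):
    """Eq. (14): 1 / [l1 (l1+l2) ... (l1+...+lM)] for a given set of ages."""
    weight, partial = Fraction(1), 0
    for l in ages:
        partial += l
        weight /= partial
    return weight


def prob_M_records(N, M):
    """Marginal P(M|N), obtained by summing Eq. (14) over all ages."""
    return sum(joint_weight(c) for c in compositions(N, M))
```

For instance, for $N = 4$ and $M = 2$ the three compositions $(1,3)$, $(2,2)$, $(3,1)$ contribute $1/4 + 1/8 + 1/12 = 11/24$, in agreement with the Stirling number $\left[ {4 \atop 2} \right] = 11$ divided by $4!$.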



2.1.3. Distribution of the age of the longest record 

We now focus on the age of the longest record, denoted by $l_{\max,N}$, which is
defined as

$$ l_{\max,N} = \max\{l_1, l_2, \cdots, l_M\} . \tag{15} $$

Its cumulative distribution $F(l|N) = \mathbb{P}(l_{\max,N} \le l)$, $l \ge 1$, is obtained from
the full joint pdf (14) by summing over $M$ and $l_1, \cdots, l_M$ with the constraint
that $l_1 \le l, \cdots, l_M \le l$. It reads

$$ F(l|N) = \sum_{M=1}^{\infty} \sum_{l_1=1}^{l} \cdots \sum_{l_M=1}^{l} \frac{\delta_{\sum_{k=1}^{M} l_k ,\, N}}{l_1 (l_1 + l_2) \cdots (l_1 + l_2 + \cdots + l_M)} , \tag{16} $$

while $F(l|0) = 1$. The GF of $F(l|N)$ with respect to (wrt) $N$ is conveniently
written using the integral representation of the pdf in (13) as

$$ \sum_{N=0}^{\infty} z^N F(l|N) = 1 + \sum_{M=1}^{\infty} \int_0^1 du_M\, f(u_M) \prod_{k=1}^{M-1} \int_0^{u_{k+1}} du_k\, f(u_k) , \tag{17} $$

where

$$ f(u) = \sum_{m=1}^{l} z^m u^{m-1} . \tag{18} $$

The multiple integral in (17) can be performed by induction in terms of the
integral of $f(u)$

$$ g(u) = \int_0^u f(v)\,dv = \sum_{m=1}^{l} \frac{z^m u^m}{m} , \tag{19} $$

yielding finally

$$ \sum_{N=0}^{\infty} z^N F(l|N) = 1 + \sum_{M=1}^{\infty} \frac{[g(1)]^M}{M!} = \exp\left( \sum_{k=1}^{l} \frac{z^k}{k} \right) . \tag{20} $$

From the GF of the full distribution of $l_{\max,N}$ (20) one obtains the GF of
the average value $\langle l_{\max,N} \rangle = \sum_{l=1}^{\infty} [1 - F(l-1|N)]$ as

$$ \sum_{N=0}^{\infty} \langle l_{\max,N} \rangle z^N = \frac{1}{1-z} \sum_{l=1}^{\infty} \left[ 1 - \exp\left( -\sum_{k=l}^{\infty} \frac{z^k}{k} \right) \right] . \tag{21} $$

By analysing this expression (21) in the limit $z \to 1$, where the discrete
sums can be replaced by integrals (setting $z = e^{-s}$), one obtains the large
$N$ behavior of $\langle l_{\max,N} \rangle$ as

$$ \langle l_{\max,N} \rangle = c_1 N + O(1) , \quad c_1 = \int_0^{\infty} dx \left( 1 - e^{-\int_x^{\infty} \frac{e^{-y}}{y}\,dy} \right) = 0.62432\ldots , \tag{22} $$

where $c_1$ is the Golomb-Dickman or Goncharov constant. This constant
$c_1$ also describes the linear growth of the longest cycle of a random
permutation. This constant also appeared in a model of growing networks and
in a one dimensional ballistic aggregation model.

2.1.4. Distribution of the age of the shortest record

We now focus on the age of the shortest record, denoted as $l_{\min,N}$, which
is defined as

$$ l_{\min,N} = \min\{l_1, l_2, \cdots, l_M\} . \tag{23} $$

We define $G(l|N) = \mathbb{P}(l_{\min,N} \ge l)$, $l \ge 1$, and using the same reasoning as
above for $l_{\max,N}$ we find the GF of $G(l|N)$ wrt $N$ as

$$ \sum_{N=1}^{\infty} G(l|N)\, z^N = \exp\left[ \sum_{k=l}^{\infty} \frac{z^k}{k} \right] - 1 . \tag{24} $$

The GF of the average value $\langle l_{\min,N} \rangle = \sum_{l=1}^{\infty} G(l|N)$ can be obtained
from (24), which yields the asymptotic result for large $N$

$$ \langle l_{\min,N} \rangle = e^{-\gamma_E} \log N + o(\log N) , \tag{25} $$

with the numerical value $e^{-\gamma_E} = 0.5614594835\ldots$.
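
The generating functions (20) and (24) are easy to expand numerically. The sketch below is one possible way to do it, not the method used in the references: it uses the standard recurrence for the coefficients of $E(z) = \exp[g(z)]$, namely $n\, e_n = \sum_k k\, g_k\, e_{n-k}$, which here reduces to a plain sum because $k\, g_k = 1$ on the allowed range of $k$. The function names are ours.

```python
from fractions import Fraction


def exp_series_coeff(N, lo, hi):
    """Coefficient of z^N in exp(sum_{k=lo}^{hi} z^k / k)."""
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        # n * e_n = sum over allowed k of e_{n-k}, since k * (1/k) = 1
        e[n] = Fraction(sum(e[n - k] for k in range(lo, min(n, hi) + 1)), n)
    return e[N]


def F_longest(l, N):
    """F(l|N) = P(l_max <= l), from Eq. (20)."""
    return exp_series_coeff(N, 1, l)


def G_shortest(l, N):
    """G(l|N) = P(l_min >= l) for N >= 1, from Eq. (24)."""
    return exp_series_coeff(N, l, N)


def mean_longest_age(N):
    """<l_max,N> = sum_{l>=0} [1 - F(l|N)]; the terms vanish for l >= N."""
    return sum(1 - F_longest(l, N) for l in range(N))


def mean_shortest_age(N):
    """<l_min,N> = sum_{l>=1} G(l|N); the terms vanish for l > N."""
    return sum(G_shortest(l, N) for l in range(1, N + 1))
```

For $N = 3$ this gives $\langle l_{\max,3} \rangle = 13/6$ and $\langle l_{\min,3} \rangle = 5/3$, which can be checked by hand from Eq. (14). For moderately large $N$ the ratio `mean_longest_age(N) / N` should approach $c_1$ from above, while the logarithmic growth (25) of the shortest age sets in only slowly.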
2.1.5. Connection with random permutations 

As we have seen repeatedly in this section, records statistics bear strong
similarities with the statistics of random permutations. The existence of
connections between the two fields is well known [53] and they recently
showed up in various problems of statistical physics [47, 52]. One of the main
manifestations of this connection is that the number of records $R_N$ for $N$
i.i.d. random variables is distributed like the number of cycles in random
permutations of $N$ objects with uniform measure (10). We refer the
interested reader to Ref. [?] for a more complete discussion of this connection.

2.2. Record statistics of random walks 

We now study the record statistics of a discrete-time random walker (RW)
moving on a continuous line. The position $x_k$ of the RW after $k$ steps
evolves via

$$ x_k = x_{k-1} + \eta_k , \tag{26} $$

starting from $x_0 = 0$ and where the jump variables $\eta_k$'s are i.i.d. variables,
drawn from a distribution $\phi(\eta)$. Here we study the record statistics of a
realization of this RW (26) after $N$ steps, $x_0, x_1, \cdots, x_N$ (there are thus
$N + 1$ random variables in this sequence). As before (1), a record is broken
after $k$ steps if $x_k > \max\{x_0, x_1, \cdots, x_{k-1}\}$, with the convention that $x_0$
is counted as a record. As in the case of i.i.d. variables, we will focus on
the number of records $R_N$ as well as on the age of the longest, $l_{\max,N}$, and
shortest, $l_{\min,N}$, record. As discussed before in the case of i.i.d. random
variables, the statistics of these quantities are conveniently obtained from
the joint probability distribution $P(\vec{l}, M|N)$ of the ages and the number of
records after $N$ time steps. The ages $l_k$'s are thus defined as the number of
steps between two records, hence as in the i.i.d. case (see Fig. 1) except
that $l_M \to l_M - 1$ (in Fig. 1 one would thus have $l_6 = 5$ for a random walk).

To compute this joint distribution $P(\vec{l}, M|N)$ we need two quantities as
inputs. The first one is the probability $q_-(l)$ that a RW, starting in $x_0$,
stays below $x_0$ after $l$ time steps:

$$ q_-(l) = \mathbb{P}\left[ x_k < x_0 , \ \forall\, 1 \le k \le l \right] . \tag{27} $$

Due to translational invariance, this probability does not depend on $x_0$ and
we can thus set $x_0 = 0$. Its GF, $\tilde{q}_-(z)$, is given by the generalized Sparre
Andersen (SA) theorem

$$ \tilde{q}_-(z) = \sum_{k=0}^{\infty} q_-(k)\, z^k = \exp\left[ \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbb{P}(x_k < 0) \right] . \tag{28} $$

The second quantity is the first-passage probability $f_-(l)$ that the RW crosses
its starting point $x_0$ between steps $(l - 1)$ and $l$ from below $x_0$. Again, $f_-(l)$
is independent of $x_0$ and one can set $x_0 = 0$. It follows from its definition
that $f_-(l) = q_-(l - 1) - q_-(l)$ so that its GF can be expressed as

$$ \tilde{f}_-(z) = \sum_{l=1}^{\infty} f_-(l)\, z^l = 1 - (1 - z)\, \tilde{q}_-(z) . \tag{29} $$

Armed with these two quantities $q_-(l)$ and $f_-(l)$ we can then write down
explicitly the joint distribution of the ages $\vec{l}$ and the number of records $M$,
$P(\vec{l}, M|N)$:

$$ P(\vec{l}, M|N) = f_-(l_1)\, f_-(l_2) \cdots f_-(l_{M-1})\, q_-(l_M)\; \delta_{\sum_{k=1}^{M} l_k ,\, N} , \tag{30} $$

where we have used the Markov property of the RW, which implies that
the intervals $l_k$'s are statistically independent, except for an overall global
constraint that the total length of the interval is $N$, which is incorporated by the
delta function. Note that since the number of records is $M$, the last interval
$l_M$ is not terminated and its pdf is thus $q_-(l_M)$ instead of $f_-(l_M)$. This
exact expression (30), together with (28) and (29), is the starting point of
the analysis of record statistics of RW. It is the analogue of the expression
in (14) obtained in the i.i.d. case.

2.2.1. Record statistics of a single symmetric random walk

Continuous jump distribution. We first consider the case of symmetric
jump distributions, such that $\phi(\eta) = \phi(-\eta)$, and focus, for the moment,
on the case where $\phi(\eta)$ is continuous (the case of lattice RW, with discrete
jump distribution, will be discussed below). In this case the SA result (28)
becomes, thanks to the fact that $\mathbb{P}(x_k < 0) = 1/2$ for any $k \ge 1$:

$$ \tilde{q}_-(z) = \tilde{q}(z) = \frac{1}{\sqrt{1 - z}} \implies q_-(l) = q(l) = \binom{2l}{l} \frac{1}{2^{2l}} , \tag{31} $$

independently of the jump distribution $\phi(\eta)$. For large $l$, one has from (31)

$$ q(l) \sim \frac{1}{\sqrt{\pi l}} . \tag{32} $$
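
The step from (28) to (31) can be checked order by order. The following Python sketch, with names of our own choosing, expands the exponential in (28) under the assumption $\mathbb{P}(x_k < 0) = 1/2$ for all $k$ (symmetric continuous jumps), using the recurrence $n\, q_n = \frac{1}{2} \sum_{j < n} q_j$ that follows from differentiating the exponential, and then builds $f(l) = q(l-1) - q(l)$.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def survival_from_sparre_andersen(L, p_negative=Fraction(1, 2)):
    """Coefficients q(0..L) of exp(sum_k p_negative * z^k / k), cf. Eq. (28).

    Assumes P(x_k < 0) = p_negative for every k, as for symmetric
    continuous jump distributions.
    """
    q = [Fraction(1)] + [Fraction(0)] * L
    for n in range(1, L + 1):
        q[n] = p_negative * sum(q[:n]) / n
    return q


def first_passage(q):
    """f(l) = q(l-1) - q(l) for l = 1..L, as stated above Eq. (29)."""
    return [q[l - 1] - q[l] for l in range(1, len(q))]


def central_binomial(l):
    """Closed form of Eq. (31): binom(2l, l) / 2^(2l)."""
    return Fraction(comb(2 * l, l), 4 ** l)
```

With `q = survival_from_sparre_andersen(200)`, each `q[l]` should coincide with `central_binomial(l)`, and `float(q[200]) * sqrt(pi * 200)` should be close to 1, in line with (32).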
On the other hand, from (30) one gets the GF of $\langle R_N \rangle$ as

$$ \sum_{N=0}^{\infty} \langle R_N \rangle\, z^N = \frac{1}{(1 - z)^2\, \tilde{q}(z)} , \tag{33} $$

from which one gets [using (31)]:

$$ \langle R_N \rangle = \sum_{k=0}^{N} \binom{2k}{k} \frac{1}{2^{2k}} = (2N + 1) \binom{2N}{N} \frac{1}{2^{2N}} \sim \sqrt{\frac{4N}{\pi}} + O(N^{-1/2}) . \tag{34} $$

It was demonstrated recently that this square root growth $\propto \sqrt{N}$ is robust
and remains the same in the presence of measurement errors and noise. Note
that from the SA theorem $q(l)$ and $f_-(l) = f(l) = q(l - 1) - q(l)$ are
universal, i.e. independent of the jump distribution: hence the full joint
distribution $P(\vec{l}, M|N)$ in (30) and any of its marginals are also universal.
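
This universality is easy to probe numerically. The Monte Carlo sketch below (our own illustration, with hypothetical names and arbitrarily chosen sample sizes) counts records for three symmetric continuous jump laws, including the Cauchy law whose variance is infinite, and compares the sample mean with the exact expression (34).

```python
import math
import random


def mean_records(draw_jump, N, samples, rng):
    """Monte Carlo estimate of <R_N> for a walk x_k = x_{k-1} + eta_k."""
    total = 0
    for _ in range(samples):
        x, best, records = 0.0, 0.0, 1     # x_0 = 0 counts as the first record
        for _ in range(N):
            x += draw_jump(rng)
            if x > best:                   # a new record is set
                best, records = x, records + 1
        total += records
    return total / samples


def exact_mean(N):
    """Eq. (34): (2N+1) binom(2N, N) / 2^(2N)."""
    return (2 * N + 1) * math.comb(2 * N, N) / 4 ** N


# Three symmetric, continuous jump distributions (illustrative choices).
JUMPS = {
    "gaussian": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(-1.0, 1.0),
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}
```

Calling `mean_records(JUMPS["cauchy"], 50, 4000, random.Random(1))` and the same for the other two laws should give values that agree with `exact_mean(50)` within the statistical error, which for these sample sizes is of order 0.1.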

Let us first consider the probability distribution of the number of records
$P(M|N) = \mathbb{P}(R_N = M) = \sum_{\vec{l}} P(\vec{l}, M|N)$. From (30) one obtains
straightforwardly

$$ \sum_{N=M-1}^{\infty} P(M|N)\, z^N = [\tilde{f}(z)]^{M-1}\, \tilde{q}(z) = \frac{\left( 1 - \sqrt{1 - z} \right)^{M-1}}{\sqrt{1 - z}} , \tag{35} $$

where we have used (29) and (31). From (35) it is possible to obtain the
full distribution

$$ P(M|N) = \binom{2N - M + 1}{N}\, 2^{-2N + M - 1} , \quad M \le N + 1 . \tag{36} $$

From (36) we can obtain the mean as in (34) and the variance, which for
large $N$ behaves like

$$ \langle R_N^2 \rangle - \langle R_N \rangle^2 = 2 \left( 1 - \frac{2}{\pi} \right) N + O(\sqrt{N}) . \tag{37} $$
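
Since (36) is fully explicit, its normalization, mean and variance can be evaluated exactly for any finite $N$. The short Python sketch below does so with rational arithmetic; the function names are ours, and the comparison with (37) is only expected to hold up to the $O(\sqrt{N})$ correction.

```python
from fractions import Fraction
from math import comb, pi


def record_number_pmf(N):
    """P(M|N) of Eq. (36) for M = 1..N+1 (continuous symmetric jumps)."""
    return {M: Fraction(comb(2 * N - M + 1, N), 2 ** (2 * N - M + 1))
            for M in range(1, N + 2)}


def moments(N):
    """Exact mean and variance of the number of records R_N."""
    pmf = record_number_pmf(N)
    mean = sum(M * p for M, p in pmf.items())
    var = sum(M * M * p for M, p in pmf.items()) - mean ** 2
    return mean, var


def leading_variance(N):
    """Leading large-N term of Eq. (37)."""
    return 2 * (1 - 2 / pi) * N
```

For example, `moments(2)` gives a mean of $15/8$, which matches $(2N+1)\binom{2N}{N} 2^{-2N}$ at $N = 2$, and for $N$ of a few hundred the exact variance divided by $N$ should approach $2(1 - 2/\pi) \approx 0.727$ from below.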

It is interesting to compare these results for the RW sequence with those
for i.i.d. random variables studied above. In particular, for i.i.d. variables,
the fluctuations of $R_N$ (7) are small compared to the mean (6) for large $N$.
In contrast, for the RW sequence, it follows from (34) and (37) that both
the mean and the standard deviation grow as $\sqrt{N}$ for $N \gg 1$: thus the
fluctuations are large and actually comparable to the mean. This suggests
that in the random walk case, at variance with the case of i.i.d. random
variables (11), the probability distribution $P(M|N)$ takes the scaling form
$P(M|N) \sim (\sqrt{N})^{-1} g_0(M/\sqrt{N})$. This can actually be shown from the
analysis of (36) for large $N$, which yields

$$ P(M|N) \sim \frac{1}{\sqrt{N}}\, g_0\!\left( \frac{M}{\sqrt{N}} \right) , \quad g_0(x) = \frac{1}{\sqrt{\pi}}\, e^{-x^2/4} , \quad x > 0 . \tag{38} $$

What can be said about the statistics of the ages of the records? The
typical age of a record $l_{\rm typ}$ can be estimated as $l_{\rm typ} \sim N / \langle R_N \rangle$, which, from
(34), thus grows like $l_{\rm typ} \sim \sqrt{\pi/4}\, \sqrt{N}$. There are however rare records
whose age behaves quite differently with $N$. As was done before in the case
of i.i.d. variables we consider the longest lasting record $l_{\max,N}$ in (15) and
the shortest duration $l_{\min,N}$ in (23).

We first consider the statistics of $l_{\max,N} = \max\{l_1, \cdots, l_M\}$ and
compute the cumulative distribution $F(l|N) = \mathbb{P}(l_{\max,N} \le l)$. As was done
before, it can be computed from the full joint pdf $P(\vec{l}, M|N)$ in (30) by
summing over $l_k \le l$ and summing over $M$. One can thus compute
the GF of $F(l|N)$ wrt $N$ as

$$ \sum_{N} F(l|N)\, z^N = \frac{\sum_{k=1}^{l} q(k)\, z^k}{1 - \sum_{k=1}^{l} f(k)\, z^k} . \tag{39} $$

One can extract, in principle, the expression of $F(l|N)$ from this
formula (39). In particular, the asymptotic large $N$ behavior of the average
$\langle l_{\max,N} \rangle = \sum_{l \ge 0} [1 - F(l|N)]$ can be extracted explicitly

$$ \langle l_{\max,N} \rangle \sim c_2 N , \quad c_2 = 2 \int_0^{\infty} dy\, \log\left[ 1 + \frac{1}{2\sqrt{\pi}}\, \Gamma(-1/2, y) \right] = 0.626508\ldots \tag{40} $$

Thus the age of the longest record ($\propto N$) is much larger than the typical
age ($\propto \sqrt{N}$). Interestingly, the constant $c_2$ for symmetric random walks
(40) is quite close to the Golomb-Dickman or Goncharov constant $c_1$ (22),
which characterizes the age of the longest record of an i.i.d. sequence. Note
however that the origin of universality is quite different in the two
problems. Interestingly, the same constant $c_2$ appears in the excursion theory
of Brownian motion. The precise link between these two problems was
shown in Ref. [?].

For the shortest lasting record $l_{\min,N} = \min\{l_1, \cdots, l_M\}$, it is also useful
to consider the cumulative distribution $G(l|N) = \mathbb{P}(l_{\min,N} \ge l)$. Its GF wrt
$N$ is easily obtained from the joint pdf (30) as:

$$ \sum_{N} G(l|N)\, z^N = \frac{\sum_{k=l}^{\infty} q(k)\, z^k}{1 - \sum_{k=l}^{\infty} f(k)\, z^k} . \tag{41} $$

In particular, one can extract from (41) the large $N$ behavior of $\langle l_{\min,N} \rangle$
as

$$ \langle l_{\min,N} \rangle \sim \sqrt{\frac{N}{\pi}} , \tag{42} $$

which grows in a similar way as the typical age of a record, albeit with a
smaller prefactor $1/\sqrt{\pi} = 0.56419\cdots$ compared with $\sqrt{\pi/4} = 0.88623\cdots$.
Notice also that it grows much faster ($\propto \sqrt{N}$) than in the case of i.i.d.
random variables ($\propto \log N$) (25).

Discrete lattice random walks. The above analysis can also be
performed for discrete lattice random walks, corresponding to
$\phi(\eta) = \frac{1}{2}\delta(\eta + 1) + \frac{1}{2}\delta(\eta - 1)$ in (26), except that in this case the
expression of $\tilde{q}(z)$ is different from (31) for symmetric jump distributions
[this can be seen from Eq. (28) as $\mathbb{P}(x_k = 0) \neq 0$ in this case]. One has then

$$ \tilde{q}(z) = \tilde{q}_-(z) = \frac{1}{1 - z} - \frac{1 - \sqrt{1 - z^2}}{z (1 - z)} \implies q(l) \underset{l \to \infty}{\sim} \frac{\sqrt{2}}{\sqrt{\pi l}} , \tag{43} $$

which differs by a factor of $\sqrt{2}$ from (32) for the continuous case. In Ref. [?]
it was shown that $\langle R_N \rangle \sim \sqrt{2N/\pi}$, which is $1/\sqrt{2}$ of the expression for the
mean in the continuous case (34). As shown in Ref. [?], the number of records
$R_N$ is, in this discrete case, directly related to the maximum of the sequence
up to step $N$, $M_N = \max(x_0, \cdots, x_N)$, via the relation $R_N = M_N + 1$. This
allows one to compute the full distribution of $R_N$ and show that, for large $N$, it
takes the scaling form as in Eq. (38), with $g_0(x) = \sqrt{2/\pi}\, e^{-x^2/2}$. Finally, in
Ref. [?] it was also found that $\langle l_{\max,N} \rangle \sim c_2 N$ and $\langle l_{\min,N} \rangle \sim \sqrt{2N/\pi}$, which
are respectively equal to, and $\sqrt{2}$ times, the corresponding expressions for
the continuous case.
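
The relation $R_N = M_N + 1$ for lattice walks can be checked exhaustively for small $N$. The Python sketch below (an illustration of ours, with hypothetical function names) enumerates all $2^N$ equally likely $\pm 1$ paths, counts records with the strict inequality used above, and verifies the relation on every path.

```python
from itertools import product


def records_and_max(steps):
    """Return (number of records, maximum) for a +/-1 path with x_0 = 0."""
    x, best, records = 0, 0, 1             # x_0 counts as a record
    for s in steps:
        x += s
        if x > best:                       # strict inequality: ties are not records
            best, records = x, records + 1
    return records, best


def lattice_mean_records(N):
    """Exact <R_N> obtained by enumerating all 2^N paths of N steps."""
    total = 0
    for steps in product((-1, 1), repeat=N):
        records, maximum = records_and_max(steps)
        assert records == maximum + 1      # R_N = M_N + 1 on every path
        total += records
    return total / 2 ** N
```

For $N = 2$ the four paths give 3, 2, 1 and 1 records, hence $\langle R_2 \rangle = 7/4$. The enumeration is only practical for $N$ up to roughly 20, where the asymptotic law $\sqrt{2N/\pi}$ is still visibly affected by finite-size corrections.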
2.2.2. Record statistics of a single random walk with a drift

Up to now, we have discussed the case of symmetric RWs, where the jump
length distribution $\phi(\eta)$ is continuous and symmetric, $\phi(\eta) = \phi(-\eta)$.
However, the renewal equation for the joint pdf $P(\vec{l}, M|N)$ in (30), as well as
the generalized SA result (28), are still valid for continuous but asymmetric
jump distributions. The only difference is that we have to use the
appropriate expressions for $q_-(l)$ (28) and $f_-(l)$ (29) instead of $q(l)$ and $f(l)$ in
the above expressions for $\langle R_N \rangle$ (33) and for the distributions of $l_{\max,N}$ (39)
and $l_{\min,N}$ (41).

In particular, one can study the case of a biased random walk which is
constructed from the symmetric random walk $x_k$ in (26) as

$$ y_k = x_k + c\, k \implies y_k = y_{k-1} + c + \eta_k , \tag{44} $$

where the $\eta_k$'s are i.i.d. variables, drawn from a distribution $\phi(\eta)$: $y_k$ thus
represents the position of a discrete-time random walker at step $k$ in the
presence of a constant drift $c$. In Ref. [?] the authors studied in detail the
special case of the Cauchy jump density, $\phi_{\rm Cauchy}(\eta) = 1/[\pi(1 + \eta^2)]$, with
arbitrary drift $c$. In particular it was found that the mean number of records
depends algebraically on $N$ with a continuously varying exponent $\theta(c)$:

$$ \langle R_N \rangle \sim N^{\theta(c)} , \quad \theta(c) = \frac{1}{2} + \frac{1}{\pi} \arctan(c) . \tag{45} $$

On the other hand, the mean number of records $\langle R_N \rangle$ for jump densities
with a finite second moment $\sigma^2$ and positive drift $c > 0$ was studied
in Ref. [?] using various approximation schemes. In Ref. [?] the authors
studied the record statistics of such a biased random walk (44) for an
arbitrary continuous jump distribution $\phi(\eta)$ such that its Fourier transform
$\hat{\phi}(q) = \int_{-\infty}^{\infty} \phi(\eta)\, e^{i q \eta}\, d\eta$ behaves, for small $q$, as

$$ \hat{\phi}(q) = 1 - |l_\mu q|^{\mu} + o(|q|^{\mu}) , \tag{46} $$

where $0 < \mu \le 2$ and $l_\mu$ is a typical length scale of the jump. The exponent
$\mu$ controls the large $|\eta|$ tail of $\phi(\eta)$. For jump densities with a well defined
second moment $\sigma^2 = \int_{-\infty}^{\infty} d\eta\, \eta^2 \phi(\eta)$ one has evidently $\mu = 2$, while for
$\mu \in (0, 2)$, $\phi(\eta) \sim |\eta|^{-1-\mu}$ for large $|\eta|$. The analysis of the record
statistics of such RW (44) for any value of $\mu \in (0, 2]$ (46) and any drift $c$ was
performed in Ref. [?]. The analysis relied on a detailed study of the behavior of the
persistence probability $q_-(l)$, which was found to be very sensitive to these
parameters $\mu$ and $c$. This study revealed the existence of five distinct
regions in the $(c, 0 < \mu \le 2)$ strip where $R_N$, $\langle l_{\max,N} \rangle$ and $\langle l_{\min,N} \rangle$ exhibit
very different behaviors. These results are summarized in Table 1.

Table 1. Summary of the main results for the records of RWs with a drift (44), from
Ref. [?]. The constant $c_2$ is given in (40) and the exponent $\theta(c)$ is given in (45), while the
constants $0 < a_\mu(c) < 1$ and $0 < C_{\rm II} < 1$ are non-universal constants given in Ref. [?].

2.2.3. Record Statistics for Multiple Random Walks

We conclude this section on records by mentioning results for the records
of $n$ symmetric independent RWs, which were obtained in Ref. [?]. At each
time step, each walker jumps by a random length drawn independently
from a symmetric and continuous distribution, as in (26). Two cases were
considered: (I) when the variance $\sigma^2$ of the jump distribution is finite
and (II) when $\sigma^2$ is divergent, as in the case of Lévy flights with index
$0 < \mu < 2$ (46). In both cases it was found that the mean record number
$\langle R_{N,n} \rangle$ grows universally as $\sim \alpha_n \sqrt{N}$ for large $N$, but with a very different
behavior of the amplitude $\alpha_n$ for $n > 1$ in the two cases. Indeed it was
shown that, for large $n$, $\alpha_n \sim 2\sqrt{\log n}$ independently of $\sigma^2$ in case I while,
in case II, the amplitude approaches an $n$-independent constant for large
$n$, $\alpha_n \to 4/\sqrt{\pi}$, independently of $0 < \mu < 2$. For finite $\sigma^2$ it was argued,
and this was confirmed by numerical simulations, that the full distribution
of $(R_{N,n}/\sqrt{N} - 2\sqrt{\log n})\sqrt{\log n}$ converges to a Gumbel law [as in Eq. (56)
below] as $N \to \infty$ and $n \to \infty$. In case II, numerical simulations indicated
that the distribution of $R_{N,n}/\sqrt{N}$ converges, for $N \to \infty$ and $n \to \infty$,
to a universal nontrivial distribution, independently of $\mu$, the computation
of which remains an open problem. Ref. [?] also discussed the applications
of these results on records for multiple random walks to the study of the
record statistics of 366 daily stock prices from the Standard & Poors 500
index.



3. Order statistics 

3.1. Order statistics of i.i.d. random variables 

Let us first review the standard results for order statistics of i.i.d. random
variables. We refer the reader to classical textbooks on the subject for
more details (see also Ref. [?] for a review). We denote by $X_1, X_2, \cdots, X_N$
a collection of $N$ i.i.d. random variables, distributed according to a
probability density function (pdf) $p(x)$. We denote their common cumulative
distribution by $P(x) = \int_{-\infty}^{x} p(y)\,dy$. We define the $N$ order statistics of this
sequence by arranging the values of $X_i$ by decreasing order of magnitude
(see Fig. 1, right)

$$ X_{\max} = M_{1,N} > M_{2,N} > \cdots > M_{N,N} = X_{\min} , \tag{47} $$

where we denote by $X_{\max}$ and $X_{\min}$ the maximum and the minimum among
the $X_i$'s.

For i.i.d. random variables, it is possible to write down explicitly the
full joint distribution $p_N(m_1, \cdots, m_N)$ of $M_{1,N}, \cdots, M_{N,N}$. To compute
it, we first note that, given the realizations of the $N$ order statistics to be
$m_1 > m_2 > \cdots > m_N$, the original variables $X_i$ are constrained to take on
the values $m_i$ ($i = 1, 2, \cdots, N$). On the other hand, by symmetry, each of
the $N!$ permutations of the $X_i$'s is assigned the same weight. Hence we
have

$$ p_N(m_1, \cdots, m_N) = N! \prod_{i=1}^{N} p(m_i) \prod_{i=1}^{N-1} \theta(m_i - m_{i+1}) , \tag{48} $$

where the product of $\theta$ functions ensures the ordering of the variables (we
recall that $\theta(x) = 1$ if $x > 0$ while $\theta(x) = 0$ if $x < 0$). From this expression
(48) one can get, in principle, any characteristic of order statistics of i.i.d.
random variables. Here, in addition to the distribution of $M_{k,N}$ we study
the gap between two successive maxima (see Fig. 1, right)

$$ d_{k,N} = M_{k,N} - M_{k+1,N} , \tag{49} $$

which is an interesting characteristic of the crowding near extreme
events.

3.1.1. Finite sample

We first focus on the pdf $f_{k,N}(m) = \partial_m \mathbb{P}(M_{k,N} \le m)$, which can be obtained
from the full joint pdf (48) by integrating over $m_1, \cdots, m_{k-1}, m_{k+1}, \cdots, m_N$
such that $m_1 > \cdots > m_{k-1} > m > m_{k+1} > \cdots > m_N$:

$$ \begin{aligned} f_{k,N}(m) = {} & N!\, p(m) \int_{m}^{\infty} dm_{k-1}\, p(m_{k-1}) \prod_{j=1}^{k-2} \int_{m_{j+1}}^{\infty} dm_j\, p(m_j) \\ & \times \int_{-\infty}^{m} dm_{k+1}\, p(m_{k+1}) \prod_{j=k+2}^{N} \int_{-\infty}^{m_{j-1}} dm_j\, p(m_j) , \end{aligned} \tag{50} $$

where $P(m) = \int_{-\infty}^{m} p(m')\,dm'$. It is then straightforward to check (for
instance by induction) that $f_{k,N}$ in (50) can be written as:

$$ f_{k,N}(m) = \frac{N!}{(k-1)!\,(N-k)!}\; p(m)\, [P(m)]^{N-k}\, [1 - P(m)]^{k-1} . \tag{51} $$

Note that this formula (51) can also be directly obtained by noticing that
the event $m < M_{k,N} < m + dm$ is the same as the following event:
$X_i > m + dm$ for $(k - 1)$ of the $X_i$'s, $m < X_i < m + dm$ for exactly
one of the $X_i$'s and $X_i < m$ for the remaining $N - k$ $X_i$'s. In particular for
$k = 1$ one obtains from (51) the pdf of $X_{\max}$ as

$$ f_{1,N}(m) = \partial_m \mathbb{P}(X_{\max} \le m) = N\, p(m)\, [P(m)]^{N-1} . \tag{52} $$

Similarly the pdf of $X_{\min}$ is obtained by setting $k = N$ in (51).

From the full joint pdf in (48) it is also possible to obtain the joint
pdf of $M_{j,N}$ and $M_{k,N}$ and eventually the pdf $p_{k,N}(d)$ of the gap
$d_{k,N} = M_{k,N} - M_{k+1,N}$ as

$$ \begin{aligned} p_{k,N}(d) = {} & \theta(d)\, \frac{N!}{(k-1)!\,(N-k-1)!} \int_{-\infty}^{\infty} dm_k\, p(m_k)\, p(m_k - d) \\ & \times [P(m_k - d)]^{N-k-1}\, [1 - P(m_k)]^{k-1} . \end{aligned} \tag{53} $$

As an example, for the case of exponential i.i.d. random variables $X_i$'s,
such that $p(x) = \theta(x)\, e^{-x}$, one finds from (53)

$$ p_{k,N}(d) = \theta(d)\, k\, e^{-kd} , \tag{54} $$

independently of $N$.
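
The exponential example can be checked by evaluating the integral in (53) numerically. The sketch below is an illustration under our own choices (a plain midpoint rule, a finite cutoff `upper` on the integration range, and a hypothetical function name); it is not meant as an efficient or general implementation.

```python
from math import exp, factorial


def gap_pdf_exponential(d, k, N, upper=60.0, steps=60000):
    """Midpoint-rule evaluation of Eq. (53) for p(x) = exp(-x), x > 0.

    `upper` and `steps` are numerical parameters chosen for this sketch.
    """
    if d <= 0:
        return 0.0                          # theta(d) prefactor
    prefactor = factorial(N) / (factorial(k - 1) * factorial(N - k - 1))
    h = upper / steps
    total = 0.0
    for i in range(steps):
        m = d + (i + 0.5) * h               # p(m_k - d) vanishes for m_k < d
        p_m = exp(-m)                       # p(m_k)
        p_shift = exp(-(m - d))             # p(m_k - d)
        cdf_shift = 1.0 - exp(-(m - d))     # P(m_k - d)
        tail = exp(-m)                      # 1 - P(m_k)
        total += p_m * p_shift * cdf_shift ** (N - k - 1) * tail ** (k - 1)
    return prefactor * total * h
```

For instance `gap_pdf_exponential(0.3, 2, 7)` should be close to $2 e^{-0.6}$, and repeating the call with a different $N > k$ should leave the value unchanged, as stated in (54).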
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3.1.2. Asymptotic results for large samples 

We now turn to the analysis of these results for i.i.d. variables in the
limit of large samples, $N \to \infty$. We first focus on
$F_{1,N}(m) = P(X_{\max} < m)$, which is known to exhibit a universal
behavior when $N \to \infty$. Indeed, one can show that there exist
constants $a_N$ and $b_N$ and three distinct families of distributions
$G_p(z)$, $p = \mathrm{I}, \mathrm{II}, \mathrm{III}$, such that

$$
\lim_{N \to \infty} F_{1,N}(a_N + b_N z) = G_p(z) \;, \quad
p = \mathrm{I}, \mathrm{II}, \mathrm{III} \;, \tag{55}
$$

where the limiting distribution $G_p(z)$ depends only on the
large-argument behavior of the parent distribution $p(x)$. The large $N$
behavior of extreme value statistics (EVS) of i.i.d. variables is thus
characterized by three distinct universality classes: (I) Gumbel,
(II) Fréchet and (III) Weibull.

The Gumbel universality class. In this case, the support of $p(x)$
may be bounded or unbounded, though the latter is the most commonly
encountered. In the latter case, the Gumbel universality class
corresponds to the case where $p(x)$ decays faster than any power law,
$p(x) \ll x^{-\eta}$ for any value of $\eta > 0$, and $G_{\mathrm{I}}(z)$ is
given by a double exponential

$$
G_{\mathrm{I}}(z) = \exp\left[ - \exp(-z) \right] \;, \tag{56}
$$

the so-called Gumbel distribution. The constant $a_N$ is given by the
standard relation of EVS

$$
1 - P(a_N) = \int_{a_N}^{\infty} p(x)\, dx = \frac{1}{N} \;, \tag{57}
$$

which simply says that there is typically one single variable, the
maximum, in the interval $[a_N, +\infty)$. On the other hand, $b_N$ is
given by the relation

$$
b_N = \frac{\int_{a_N}^{\infty} (x - a_N)\, p(x)\, dx}{\int_{a_N}^{\infty} p(x)\, dx} \;, \tag{58}
$$

which can be interpreted as the typical distance between $X_{\max}$ and
$a_N$, conditioned on there being a single variable in $[a_N, +\infty)$.
The Gumbel universality class includes, for instance, the case where
$p(x)$ is an exponential or a Gaussian distribution. It also includes
the case where $p(x)$ is defined on a bounded support, for instance
$x \in [0, 1)$, and exhibits an essential singularity at $x = 1$,
$p(x) \sim \exp\left[-1/(1 - x)^{\nu}\right]$, with $\nu > 0$.
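As a worked illustration of the relations $1 - P(a_N) = 1/N$ in (57) and $b_N = \int_{a_N}^{\infty}(x-a_N)p(x)dx \big/ \int_{a_N}^{\infty}p(x)dx$ in (58), consider an exponential parent (an assumed example). The sketch below uses the fact that the maximum of $N$ i.i.d. variables has cumulative distribution $F_{1,N}(m) = [P(m)]^N$.

```latex
% Assumed example: p(x) = \theta(x) e^{-x}, so that P(x) = 1 - e^{-x} for x > 0.
\begin{align*}
1 - P(a_N) &= e^{-a_N} = \frac{1}{N}
  &&\Longrightarrow\quad a_N = \ln N , \\
b_N &= \frac{\int_{a_N}^{\infty} (x-a_N)\, e^{-x}\, dx}
            {\int_{a_N}^{\infty} e^{-x}\, dx}
     = \frac{e^{-a_N}}{e^{-a_N}} = 1 , \\
F_{1,N}(a_N + b_N z) &= \bigl[P(\ln N + z)\bigr]^{N}
     = \Bigl(1 - \frac{e^{-z}}{N}\Bigr)^{N}
  &&\xrightarrow[N\to\infty]{}\; \exp\!\bigl[-e^{-z}\bigr] .
\end{align*}
```

The maximum thus sits typically at $\ln N$ with fluctuations of order one, distributed according to the Gumbel law (56).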

The Fréchet universality class. This class corresponds to the case
where the support of $p(x)$ is unbounded and where $p(x)$ has a power law
tail $p(x) \propto x^{-(1+\alpha)}$ with $\alpha > 0$. In this case the
limiting distribution $G_{\mathrm{II}}(z)$ is given by

$$
G_{\mathrm{II}}(z) = \theta(z)\, \exp\left[ - z^{-\alpha} \right] \;. \tag{59}
$$

Besides, one has in this case $a_N = 0$ while $b_N$ is given by

$$
1 - P(b_N) = \int_{b_N}^{\infty} p(x)\, dx = \frac{1}{N} \;, \tag{60}
$$

from which one gets in particular that $b_N \propto N^{1/\alpha}$. This
situation corresponds to the case where $p(x)$ is, for instance, a Cauchy
distribution or a Pareto distribution.

The Weibull universality class. This corresponds to the situation
where the support of $p(x)$ is bounded from above, such that $p(x) = 0$ if
$x > x^*$, and $p(x)$ behaves when $x$ approaches $x^*$ as
$p(x) \propto (x^* - x)^{\alpha - 1}$, $\alpha > 0$. In this case the
limiting distribution $G_{\mathrm{III}}(z)$ is given by

$$
G_{\mathrm{III}}(z) =
\begin{cases}
\exp\left[ -|z|^{\alpha} \right] \;, & z < 0 \;, \\
1 \;, & z > 0 \;.
\end{cases} \tag{61}
$$

In this third case, one has naturally $a_N = x^*$ while $b_N$ is given by

$$
\int_{x^* - b_N}^{x^*} p(x)\, dx = \frac{1}{N} \;, \tag{62}
$$

from which one gets that $b_N \propto N^{-1/\alpha}$. This universality
class includes, for instance, the case where $p(x)$ is a uniform
distribution, $p(x) = \theta(x)\,\theta(1 - x)$ (and in this case
$\alpha = 1$).
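The convergence to the three limiting laws can be checked numerically with the exact finite-$N$ result $F_{1,N}(m) = [P(m)]^N$. The sketch below does this for one representative parent per class; the specific parents (exponential, Pareto with an assumed $\alpha = 1.5$, uniform), the test points $z$ and the helper name `max_cdf` are illustrative choices, not taken from the text.

```python
import math

def max_cdf(P, N):
    """Exact cdf of the maximum of N i.i.d. variables with parent cdf P."""
    return lambda m: P(m) ** N

N = 10**6
alpha = 1.5  # assumed tail exponent for the Pareto example, p(x) = alpha x^(-1-alpha), x > 1

cases = {
    # name: (parent cdf, a_N, b_N, limiting cdf, test point z)
    "Gumbel (exponential)": (lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0,
                             math.log(N), 1.0,
                             lambda z: math.exp(-math.exp(-z)), 0.5),
    "Frechet (Pareto)": (lambda x: 1.0 - x ** (-alpha) if x > 1 else 0.0,
                         0.0, N ** (1.0 / alpha),
                         lambda z: math.exp(-z ** (-alpha)), 1.2),
    "Weibull (uniform)": (lambda x: min(max(x, 0.0), 1.0),
                          1.0, 1.0 / N,
                          lambda z: math.exp(z) if z < 0 else 1.0, -0.7),
}

errors = {}
for name, (P, a_N, b_N, G, z) in cases.items():
    exact = max_cdf(P, N)(a_N + b_N * z)   # finite-N value of F_{1,N}(a_N + b_N z)
    errors[name] = abs(exact - G(z))       # distance to the limiting law
    print(f"{name}: finite-N {exact:.7f}, limit {G(z):.7f}")
```

In each case the scaling constants follow from the relations quoted above: $a_N = \ln N$, $b_N = 1$ for the exponential; $a_N = 0$, $b_N = N^{1/\alpha}$ for the Pareto; $a_N = x^* = 1$, $b_N = 1/N$ for the uniform.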

One can now study the limiting behavior of the distribution of the $k$-th
maximum, $F_{k,N}(m) = P(M_{k,N} < m)$. In this case, depending on the
parent distribution $p(x)$, which might belong to one of the three
aforementioned universality classes,
$p = \mathrm{I}, \mathrm{II}, \mathrm{III}$, one can show that

$$
\lim_{N \to \infty} F_{k,N}(a_N + b_N z)
  = G_p(z) \sum_{j=0}^{k-1} \frac{\left[ - \ln G_p(z) \right]^j}{j!} \tag{63}
$$

$$
  = \frac{1}{(k-1)!} \int_{-\ln G_p(z)}^{\infty} e^{-t}\, t^{k-1}\, dt \;, \tag{64}
$$

where $G_p$, with $p = \mathrm{I}, \mathrm{II}, \mathrm{III}$, is one of the
three limiting distributions given above in Eqs. (56), (59) or (61).
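The limit (63) has a simple interpretation: $M_{k,N} < m$ means that at most $k-1$ of the $N$ variables exceed $m$, and the number of such exceedances becomes Poisson distributed with mean $-\ln G_p(z)$ at large $N$. The sketch below compares the exact binomial sum at finite $N$ with the right-hand side of (63), $G_p(z)\sum_{j<k}[-\ln G_p(z)]^j/j!$, for an exponential parent (Gumbel class). The function names and the chosen values of $N$, $k$, $z$ are illustrative assumptions.

```python
import math

def kth_max_cdf_exact(m, k, N, P):
    """P(M_{k,N} < m): at most k-1 of the N variables exceed m (binomial sum)."""
    q = 1.0 - P(m)
    return sum(math.comb(N, j) * q**j * (1.0 - q)**(N - j) for j in range(k))

def kth_max_cdf_limit(z, k, G):
    """Right-hand side of (63): G(z) * sum_{j<k} (-ln G(z))^j / j!."""
    x = -math.log(G(z))
    return G(z) * sum(x**j / math.factorial(j) for j in range(k))

# Assumed example: exponential parent, for which a_N = ln N and b_N = 1
P_exp = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0
G_gumbel = lambda z: math.exp(-math.exp(-z))

N, k, z = 10**5, 3, 0.3
exact = kth_max_cdf_exact(math.log(N) + z, k, N, P_exp)
limit = kth_max_cdf_limit(z, k, G_gumbel)
print("finite N:", exact, " limit (63):", limit)
```

For $k = 1$ the sum reduces to its first term and one recovers $G_p(z)$ itself.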

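A more complete result can also be obtained for the full asymptotic
distribution of the vector of the $k$ first maxima: as $N \to \infty$,
the rescaled maxima converge in distribution,

$$
\left( \frac{M_{1,N} - a_N}{b_N}, \dots, \frac{M_{k,N} - a_N}{b_N} \right)
  \to (W_1, \dots, W_k) \;, \tag{65}
$$

where the joint pdf of $W_1, \dots, W_k$ is given by

$$
p(w_1, \dots, w_k) = G_p(w_k) \prod_{i=1}^{k} \frac{g_p(w_i)}{G_p(w_i)} \;, \quad
w_1 > \dots > w_k \;, \tag{66}
$$

where $g_p(z) = G_p'(z)$. This expression (66) is already well known. From
it we derive the expression for the limiting distribution of the $k$-th gap
$d_{k,N} = M_{k,N} - M_{k+1,N}$, which we have not seen in the literature
before. It reads:

$$
p_{k,N}(d) \sim \frac{1}{b_N}\, p_{\mathrm{gap},p}\!\left( \frac{d}{b_N} \right) \;, \tag{67}
$$

where $p_{\mathrm{gap},p}(d)$ is given by

$$
p_{\mathrm{gap},p}(d) = \frac{\theta(d)}{\Gamma(k)} \int_{-\infty}^{\infty} dx\;
  g_p(x)\, \frac{g_p(x+d)}{G_p(x+d)}\, \left[ - \ln G_p(x+d) \right]^{k-1} \;. \tag{68}
$$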

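In particular, for the Gumbel universality class, one finds simply

$$
p_{\mathrm{gap,I}}(d) = \theta(d)\, k\, e^{-k d} \;. \tag{69}
$$

For the Fréchet universality class, i.e. $p = \mathrm{II}$, one finds
from (68):

$$
p_{\mathrm{gap,II}}(d) = \theta(d)\, \frac{\alpha^2}{\Gamma(k)}
  \int_0^{\infty} x^{-\alpha-1}\, (x + d)^{-\alpha k - 1}\, e^{-x^{-\alpha}}\, dx \;. \tag{70}
$$

In particular for large $d$, it behaves like

$$
p_{\mathrm{gap,II}}(d) \sim \frac{\alpha}{\Gamma(k)}\, d^{-1-\alpha k} \;. \tag{71}
$$

For $\alpha = 1$, the above integral (70) can be explicitly evaluated

$$
p_{\mathrm{gap,II}}(d) = \theta(d)\, k (k+1)\, d^{-1-k}\, U(k+1, 0, 1/d) \;, \tag{72}
$$

where $U(a, b, z)$ is the confluent (Tricomi) hypergeometric function, which
is consistent, for large $d$, with (71) for $\alpha = 1$.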

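Finally, for the Weibull universality class, one finds

$$
p_{\mathrm{gap,III}}(d) = \theta(d)\, \frac{\alpha^2}{\Gamma(k)}
  \int_0^{\infty} x^{\alpha k - 1}\, (x + d)^{\alpha - 1}\, e^{-(x+d)^{\alpha}}\, dx \;, \tag{73}
$$

which for large $d$ behaves like

$$
p_{\mathrm{gap,III}}(d) \sim \frac{\alpha^{2 - \alpha k}\, \Gamma(\alpha k)}{\Gamma(k)}\;
  d^{(1-\alpha)(\alpha k - 1)}\, e^{-d^{\alpha}} \;. \tag{74}
$$

For $\alpha = 1$, this expression (73) simplifies to yield simply

$$
p_{\mathrm{gap,III}}(d) = \theta(d)\, e^{-d} \;. \tag{75}
$$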
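The closed forms (69) and (75) can be cross-checked by direct quadrature. The sketch below assumes that the general gap formula reads $p_{\mathrm{gap},p}(d) = \frac{\theta(d)}{(k-1)!}\int dx\, g_p(x)\, \frac{g_p(x+d)}{G_p(x+d)}\, [-\ln G_p(x+d)]^{k-1}$, which is what one obtains by integrating the joint pdf (66) over $w_1 > \dots > w_{k-1}$. To avoid dividing by underflowing values of $G_p$, the ratio $g_p/G_p$ and $-\ln G_p$ are passed as separate functions; the names `gap_pdf`, `hazard` and `neg_log_G`, as well as the integration windows, are illustrative.

```python
import math

def gap_pdf(d, k, g, hazard, neg_log_G, lo, hi, n=40000):
    """Midpoint-rule estimate of
    1/(k-1)! * int_lo^hi dx g(x) * hazard(x+d) * neg_log_G(x+d)^(k-1),
    with hazard = g/G and neg_log_G = -ln G for the chosen limiting law."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += g(x) * hazard(x + d) * neg_log_G(x + d) ** (k - 1)
    return h * total / math.factorial(k - 1)

# Gumbel class: G(z) = exp(-exp(-z)), so g/G = exp(-z) and -ln G = exp(-z)
gumbel = dict(g=lambda x: math.exp(-x - math.exp(-x)),
              hazard=lambda y: math.exp(-y),
              neg_log_G=lambda y: math.exp(-y))

# Weibull class with alpha = 1: G(z) = exp(z) for z < 0, so g/G = 1 and -ln G = -z
weibull1 = dict(g=lambda x: math.exp(x),
                hazard=lambda y: 1.0,
                neg_log_G=lambda y: -y)

k, d = 2, 0.5
p_gumbel = gap_pdf(d, k, lo=-10.0, hi=40.0, **gumbel)
print("Gumbel :", p_gumbel, " expected k e^{-kd} =", k * math.exp(-k * d))

k3, d3 = 3, 0.8
# For the Weibull class both x and x + d must be negative, hence x < -d
p_weibull = gap_pdf(d3, k3, lo=-40.0 - d3, hi=-d3, **weibull1)
print("Weibull:", p_weibull, " expected e^{-d} =", math.exp(-d3))
```

The same routine can be pointed at the Fréchet class by supplying $g_{\mathrm{II}}$, $g_{\mathrm{II}}/G_{\mathrm{II}}$ and $-\ln G_{\mathrm{II}}$ on $x > 0$, although the slow power-law decay then calls for a wider window.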
3.2. Order statistics of random walks

As we have seen, the order statistics of i.i.d. random variables are fully
understood, thanks in particular to the identification of three different
universality classes. In this section, we present recent results for the
order statistics of random walks, which offer a non-trivial instance of a
set of strongly correlated variables where exact results can be obtained.
We will see that the results are quite different from the i.i.d. case.

We thus consider a RW which starts at $x_0 = 0$ at time $0$ and evolves via
(26), where the $\eta_i$'s are i.i.d. random jumps, each drawn from a
symmetric distribution $\phi(\eta)$. We study the fluctuations of the
ordered sequence $M_{1,N} > M_{2,N} > \dots > M_{N+1,N}$, where $M_{k,N}$ is
the $k$-th maximum of the RW after $N$ time steps, hence
$k = 1, \dots, N+1$. The study of order statistics for random walks, beyond
the first maximum $X_{\max} = M_{1,N}$, was initiated recently for the case
where the jump distribution $\phi(\eta)$ has a well defined second moment
$\sigma^2$. In this case, the RW converges, in the limit of a large number
of steps $N$, to Brownian motion. It was shown in this case that, when
$N \to \infty$,

$$
\frac{\langle M_{k,N} \rangle}{\sigma} = \sqrt{\frac{2N}{\pi}} + O(1) \;, \tag{76}
$$

independently of $k$. Thus the crowding of extrema (the $k$-dependence) is
not captured by the statistics of the maxima $M_{k,N}$ themselves, at least
to leading order for large $N$. The simplest observable that is sensitive
to the crowding phenomenon is thus the gap, $d_{k,N} = M_{k,N} - M_{k+1,N}$.
The main result is that the statistics of the scaled gap $d_{k,N}/\sigma$
becomes stationary, i.e., independent of $N$ for large $N$, but retains a
rich, nontrivial $k$ dependence which becomes universal for large $k$,
i.e. independent of the details of the jump distribution $\phi(\eta)$.
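These definitions are easy to explore by direct simulation. The sketch below is a minimal Monte Carlo illustration, assuming Gaussian jumps with $\sigma = 1$ (one possible choice of $\phi(\eta)$ with a finite second moment); the function name `ordered_positions`, the seed and the sample sizes are arbitrary illustrative choices. It estimates $\langle M_{1,N} \rangle$, to be compared with the leading behavior $\sigma\sqrt{2N/\pi}$, and the first few mean gaps $\langle d_{k,N} \rangle$.

```python
import math
import random

def ordered_positions(n_steps, rng, sigma=1.0):
    """Positions x_0 = 0, x_1, ..., x_N of a RW with Gaussian jumps
    (an assumed choice of phi), sorted so that result[k-1] = M_{k,N}."""
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        path.append(x)
    return sorted(path, reverse=True)

rng = random.Random(12345)
N, n_samples, k_max = 400, 1000, 5
mean_M = [0.0] * (k_max + 1)   # sample averages of M_{1,N}, ..., M_{k_max+1,N}
for _ in range(n_samples):
    M = ordered_positions(N, rng)
    for k in range(k_max + 1):
        mean_M[k] += M[k] / n_samples

# <d_{k,N}> = <M_{k,N}> - <M_{k+1,N}> for k = 1, ..., k_max
mean_gap = [mean_M[k] - mean_M[k + 1] for k in range(k_max)]
leading = math.sqrt(2 * N / math.pi)   # leading behavior of <M_{k,N}> for sigma = 1
print("<M_1> / sqrt(2N/pi):", mean_M[0] / leading)
print("mean gaps <d_k>, k = 1..5:", mean_gap)
```

The ratio printed first approaches one only slowly, since the correction to the leading term is of order one while the leading term grows as $\sqrt{N}$. The mean gaps, by contrast, are of order one and decrease with $k$.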

In particular, using the so-called Pollaczek-Wendel identity [60, 61], the
stationary mean gap $\bar{d}_k = \langle d_{k,\infty} \rangle$ was computed
exactly for all $k$ and for arbitrary $\phi(\eta)$ [whose Fourier transform
is denoted by $\hat{\phi}(q)$]:

$$
\frac{\langle d_{k,\infty} \rangle}{\sigma}
  = \frac{1}{\sqrt{2\pi}}\, \frac{\Gamma(k + \frac{1}{2})}{\Gamma(k+1)}
  + \frac{1}{\pi k} \int_0^{\infty} \frac{dq}{q^2}\, \Big[ \cdots \Big] \;. \tag{77}
$$

In the limit of large $k$, one finds from (77) that

$$
\frac{\langle d_{k,\infty} \rangle}{\sigma} \sim \frac{1}{\sqrt{2 \pi k}} \;, \tag{78}
$$

independently of $\phi(\eta)$. This $k^{-1/2}$ dependence of $\bar{d}_k$ in
(78) was actually noticed in a numerical study of periodic random walks and
was also conjectured to be exact, based on scaling arguments.
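The $k^{-1/2}$ law can be made concrete with a short numerical check. The sketch below assumes that the first term of (77) is $\Gamma(k+\frac{1}{2})/[\sqrt{2\pi}\,\Gamma(k+1)]$ and compares it with the asymptotic form $1/\sqrt{2\pi k}$ of (78); it says nothing about the integral term, which is taken here to be subleading at large $k$, as the text implies. The function names are illustrative.

```python
import math

def first_term(k):
    """Gamma(k + 1/2) / (sqrt(2 pi) Gamma(k + 1)), via log-gamma for stability."""
    return math.exp(math.lgamma(k + 0.5) - math.lgamma(k + 1.0)) / math.sqrt(2 * math.pi)

def large_k(k):
    """Asymptotic form quoted in (78): 1 / sqrt(2 pi k)."""
    return 1.0 / math.sqrt(2 * math.pi * k)

for k in (1, 10, 100, 1000):
    ratio = first_term(k) / large_k(k)
    print(f"k = {k:5d}  first term = {first_term(k):.6f}  "
          f"1/sqrt(2 pi k) = {large_k(k):.6f}  ratio = {ratio:.6f}")
```

The ratio tends to one from below, with a relative correction of order $1/k$, so the asymptotic form is already accurate to about one percent at $k \approx 10$.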

It is natural to wonder about the full distribution of the stationary
gap, not only its first moment (77). This full pdf,
$p_k(\delta)\, d\delta = P(d_{k,\infty} \in [\delta, \delta + d\delta])$, was
computed exactly, using backward Fokker-Planck techniques, for the
particular case of jump variables with an exponential distribution
$\phi(\eta) = (2b)^{-1} \exp(-|\eta|/b)$. In the limit of large $k$, it was
shown that there is a scaling regime when
$\delta \sim \langle d_{k,\infty} \rangle \simeq \sigma/\sqrt{2\pi k}$ where
the pdf scales as
$p_k(\delta) \simeq (\sqrt{k}/\sigma)\, P(\delta \sqrt{k}/\sigma)$, with a
nontrivial scaling function

$$
P(x) = 4 \left[ \sqrt{\frac{2}{\pi}}\, (1 + 2x^2)
  - e^{2x^2}\, x\, (4x^2 + 3)\, \mathrm{erfc}(\sqrt{2}\, x) \right] \;, \tag{79}
$$

where $\mathrm{erfc}(z) = (2/\sqrt{\pi}) \int_z^{\infty} e^{-t^2}\, dt$ is the
complementary error function. While it was not possible to compute the gap
pdf for arbitrary $\phi(\eta)$, numerical simulations provided strong
evidence that the scaling function $P(x)$ in Eq. (79) is actually
universal, i.e., independent of $\phi(\eta)$. Somewhat unexpectedly, we
find that this universal scaling function has an algebraic tail
$P(x) \sim x^{-4}$ for large $x$. For
$\delta \gg \langle d_{k,\infty} \rangle \simeq \sigma/\sqrt{2\pi k}$, the pdf
gets cut off in a nonuniversal fashion. Thus there are two scales
associated with $d_{k,\infty}$: a typical fluctuation, which is universal,
and large fluctuations, which are nonuniversal. This is shown to have
interesting consequences for the moments of the stationary gap:
$\langle d_{k,N}^p \rangle \sim k^{-p/2}$ for $p < 3$, while
$\langle d_{k,N}^p \rangle \sim k^{-3/2}$ for $p > 3$.
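Several properties of the scaling function can be checked numerically. The sketch below assumes the form $P(x) = 4\left[\sqrt{2/\pi}\,(1+2x^2) - e^{2x^2} x (4x^2+3)\,\mathrm{erfc}(\sqrt{2}x)\right]$ for (79) and verifies that it is normalized, that its mean equals $1/\sqrt{2\pi}$ (as required for consistency with the mean gap $\sigma/\sqrt{2\pi k}$), and that it approaches the tail $\frac{3}{2\sqrt{2\pi}}\, x^{-4}$ obtained from the standard large-argument expansion of erfc. Since the direct formula suffers from cancellations and overflow at large $x$, the integrals are cut at an arbitrary $X = 6$ and completed with the tail; the helper names are illustrative.

```python
import math

C = math.sqrt(2.0 / math.pi)
A = 3.0 / (2.0 * math.sqrt(2.0 * math.pi))   # amplitude of the x^-4 tail

def P_scaling(x):
    """Scaling function (79); direct evaluation, reliable for moderate x only."""
    return 4.0 * (C * (1.0 + 2.0 * x * x)
                  - math.exp(2.0 * x * x) * x * (4.0 * x * x + 3.0)
                  * math.erfc(math.sqrt(2.0) * x))

def tail(x):
    """Leading large-x behavior A * x^-4."""
    return A * x ** -4

def simpson(f, a, b, n=6000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3.0

X = 6.0  # beyond X the integrals are completed with the x^-4 tail
norm = simpson(P_scaling, 0.0, X) + A / (3.0 * X ** 3)
mean = simpson(lambda x: x * P_scaling(x), 0.0, X) + A / (2.0 * X ** 2)
print("normalization:", norm)
print("mean:", mean, " 1/sqrt(2 pi):", 1.0 / math.sqrt(2.0 * math.pi))
print("P(X)/tail(X):", P_scaling(X) / tail(X))
```

The $x^{-4}$ tail also makes the statement about moments transparent: $\int x^p P(x)\, dx$ converges only for $p < 3$, so higher moments are controlled by the nonuniversal cut-off rather than by the scaling function.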

We end this section on order statistics of RW by mentioning that
exact results have been recently obtained, using first-passage techniques,
for the joint distribution $P_N(g, l)$ of the first gap
$d_{1,N} = G_N = M_{1,N} - M_{2,N}$ and the time $L_N = n_1 - n_2$ between
the occurrence of these first two maxima. This analysis was carried out
for any value of the Lévy index $0 < \mu < 2$ (46). In particular, it was
shown that $P_N(g, l)$ converges to a stationary distribution, i.e.
independent of $N$ for large $N$, which displays a very rich behavior as a
function of $g$ and $l$ as $\mu$ is varied.

4. Conclusion 

To conclude, after a brief review on records and order statistics for i.i.d. 
random variables, we have presented the main results which were recently 
obtained for records and order statistics of RW, using first-passage concepts. 
A striking feature of these statistics for N i.i.d. random variables is their 
universality with respect to their common parent distribution. For records,
universality shows up, to a large extent, already for any finite N. This can 
be seen, for instance, through their connection with the statistics of random 
permutations. For extreme and order statistics, universality only appears
in the (thermodynamic) limit of large $N$, thanks to the existence of three
distinct universality classes (Gumbel, Fréchet and Weibull). What is left of
this universal behavior in the presence of strong correlations is an
important question. Quite interestingly, for RW with symmetric and
continuous jump distribution $\phi(\eta)$, the record statistics do not
depend on the details of $\phi(\eta)$ (including Lévy RW such that
$\phi(\eta) \sim |\eta|^{-1-\mu}$ with $0 < \mu < 2$), even for a finite
number of steps. This universality is due here to the Sparre Andersen
theorem. In the presence of a drift $c$, universal behavior also emerges but
only in the limit of a large number of steps $N \to \infty$. However, in
this case, the asymptotic behavior depends on both $c$ and the Lévy index
$\mu$ (see Table 1).

On the other hand, order statistics of RW are quite sensitive to the jump
distribution $\phi(\eta)$. For instance, the distribution of the gap
$d_{k,N}$ for finite $k$ and $N$ generically depends on the details of
$\phi(\eta)$. However, for large $k$ and large $N$, a scaling regime was
identified when $d_{k,N} \sim 1/\sqrt{k}$ where the fluctuations are
universal and described by a universal scaling function (79), at least in
the case where the jump distribution $\phi(\eta)$ has a finite second
moment. The statement of the universality of this scaling regime is based
on (i) an exact calculation for the case of exponential jumps and
(ii) numerical simulations. It will be interesting to establish this
universal behavior on firmer grounds. Finally, it will be interesting to
extend this study of records and order statistics to other stochastic
processes, in particular non-Markovian ones.